Abstract

A complex network is a useful tool for representing and analyzing complex systems, such as the world-wide web and 
transportation systems. However, the growing size of complex networks is becoming an obstacle to the understanding of 
the topological structure and their characteristics. In this study, a globally and locally adaptive network backbone (GLANB) 
extraction method is proposed. The GLANB method uses the involvement of links in shortest paths and a statistical 
hypothesis to evaluate the statistical importance of the links; then it extracts the backbone, based on the statistical 
importance, from the network by filtering the less important links and preserving the more important links; the result is an 
extracted subnetwork with fewer links and nodes. The GLANB determines the importance of the links by synthetically 
considering the topological structure, the weights of the links and the degrees of the nodes. The links that have a small 
weight but are important from the view of topological structure are not belittled. The GLANB method can be applied to all 
types of networks regardless of whether they are weighted or unweighted and regardless of whether they are directed or 
undirected. The experiments on four real networks show that the link importance distribution given by the GLANB method 
has a bimodal shape, which gives a robust classification of the links; moreover, the GLANB method tends to put the nodes 
that are identified as the core of the network by the k-shell algorithm into the backbone. This method can help us to 
understand the structure of the networks better, to determine what links are important for transferring information, and to 
express the network by a backbone easily. 
Introduction

In recent years, complex networks have been investigated by 
scholars in many domains. The representation, analysis and 
modeling in complex network theory bring a new paradigm to
research on some complex systems including the Internet, 
transportation systems, biological systems, and social systems [1]. 
One of the primary aims of complex network research is to reveal 
the structural characteristics of complex systems. Many emerging 
concepts, such as the small-world property [2], scale-free behavior
[3], community structure [4], and fractality [5], form the basis of 
our understanding of complex network structure. Because the 
scales of networks are becoming larger, a more intuitive and 
efficient method is required to represent and analyze the complex 
networks. Reducing a large-scale network to an essential backbone 
can help to solve the conflicts between the large scale of the 
complex networks and the understanding of the network structure. 
The backbone of a network is a core component that is extracted 
by filtering redundant information from the network and 
preserving far fewer links and nodes from the original network. 

The filtering methods for backbone extraction can be divided
into two main categories: global methods and local methods. Some
global methods use certain global measures to filter the links, such
as the link betweenness-based method [6] and the link weight-based
method [7]. These methods apply a global threshold on the
weights or the betweenness of links in such a way that only those
that exceed the threshold are preserved. These filters have been
used in the study of functional networks that connect correlated
human brain sites [8], food web resistance as a function of link
magnitude [9], and mobile communications networks [7]. The
link weight-based method, however, could neglect nodes that have
a small strength (the strength of node $i$ is defined as
$s_i = \sum_{j \in N_i} w_{ij}$, where $w_{ij}$ is the weight of the link $(i,j)$ and $N_i$ is
the set of neighbors of node $i$) because the introduction of a
threshold induces a characteristic scale from the outside [10].

Link salience [11], another type of global method, defines
the shortest-path tree $T(r)$ that summarizes the shortest paths
from a reference node $r$ to the remainder of the network and that
is conveniently represented by a symmetric $N \times N$ matrix ($N$ is the
number of nodes in the network) that has the element $t_{ij}(r) = 1$ if
the link $(i,j)$ is part of at least one of the shortest paths and
$t_{ij}(r) = 0$ if it is not. The central idea of the approach is based on
the notion of the average shortest-path tree that is defined as

$$S = \langle T \rangle = \frac{1}{N} \sum_{r} T(r).$$

The element $0 \le s_{ij} \le 1$ of the matrix $S$
quantifies the fraction of the shortest-path trees that the link $(i,j)$
participates in and denotes the salience of the link $(i,j)$. Link
salience is a robust approach to classifying network elements
because the distribution of $s$, the link salience, exhibits a
characteristic bimodal shape on the unit interval in many kinds
of networks [11]. Link salience, however, tends to give a higher
evaluation to the links adjacent to low-degree nodes, which
often lie in the periphery of networks, than to the links adjacent
to high-degree nodes. For example, in Figure 1, link $(i,p)$ is a part
of the shortest-path tree $T(r)$ for all of the reference nodes, i.e.,
$s_{ip} = 1$, because $(i,p)$ is the only path that connects node $p$ to the
remainder of the network. Thus, link $(i,p)$ is always a part of the
backbone extracted by the link salience method even though the
link is meaningful only for node $p$ to transfer information between
it and the rest of the nodes.
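As a rough illustration of the salience definition, the following Python sketch builds one shortest-path tree per reference node with Dijkstra's algorithm (link length $1/w$) and averages the trees. The function name `link_salience`, the adjacency-dictionary input format and the toy network are assumptions made for this example and are not taken from [11]; the sketch also keeps a single shortest path per target when ties occur.

```python
import heapq
from collections import defaultdict


def link_salience(adj):
    """adj: {node: {neighbor: weight}} for an undirected, connected graph.

    Returns {(u, v): salience} with u < v. Hypothetical helper for illustration.
    """
    nodes = list(adj)
    counts = defaultdict(int)
    for r in nodes:
        # Dijkstra from reference node r, with link length d = 1 / weight.
        dist = {r: 0.0}
        parent = {}
        done = set()
        heap = [(0.0, r)]
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            for v, w in adj[u].items():
                nd = d + 1.0 / w
                if v not in dist or nd < dist[v]:
                    dist[v] = nd
                    parent[v] = u  # one shortest path per target is kept
                    heapq.heappush(heap, (nd, v))
        # Every (parent, child) pair is a link of the tree T(r).
        for v, u in parent.items():
            counts[tuple(sorted((u, v)))] += 1
    n = len(nodes)
    return {link: c / n for link, c in counts.items()}


# Toy network: a triangle i-j-k with a pendant node p attached to i.
toy = {
    "i": {"j": 1, "k": 1, "p": 1},
    "j": {"i": 1, "k": 1},
    "k": {"i": 1, "j": 1},
    "p": {"i": 1},
}
salience = link_salience(toy)
```

In this toy network the pendant link `("i", "p")` lies in every shortest-path tree, which mirrors the behavior discussed for Figure 1.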

The local methods use local measures to determine which links
must be filtered, such as the disparity filter method [10] and the
locally adaptive network sparsification (LANS) [12]. The disparity
filter method introduces the normalized weight that corresponds to
link $(i,j)$ of a certain node $i$ of degree $k_i$ and is defined as
$p_{ij} = w_{ij}/s_i$, where $w_{ij}$ is the weight of the link and $s_i$ is the strength of
node $i$. The normalized weight is assumed to be produced by a
random assignment from a uniform distribution; thus, the
probability density function of $p_{ij}$ is assumed to be
$f(x;k_i) = (k_i - 1)(1 - x)^{k_i - 2}$. The backbone will include those
links whose normalized weights satisfy the relation

$$\alpha_{ij} = 1 - (k_i - 1)\int_0^{p_{ij}} (1 - x)^{k_i - 2}\,dx < \alpha \quad \text{or} \quad \alpha_{ji} = 1 - (k_j - 1)\int_0^{p_{ji}} (1 - x)^{k_j - 2}\,dx < \alpha,$$

where $\alpha$ is a specified
significance level. Here $\alpha_{ij}$ and $\alpha_{ji}$ denote the significance of the link's
normalized weight not following the uniform distribution. The
local heterogeneity (Section 3.1) of a link's weight is the premise of
the disparity filter method [10].
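The integral in the disparity filter has the closed form $\alpha_{ij} = (1 - p_{ij})^{k_i - 1}$, which makes the test easy to sketch in Python. The function names `disparity_alpha` and `disparity_backbone`, the adjacency-dictionary format and the toy star network below are assumptions for illustration only; nodes of degree 1 are simply skipped here.

```python
def disparity_alpha(adj):
    """adj: {node: {neighbor: weight}} for an undirected graph.

    Returns {(i, j): alpha_ij} seen from every node i with degree >= 2,
    using the closed form alpha_ij = (1 - p_ij) ** (k_i - 1).
    """
    alpha = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # assumption: degree-1 endpoints are not tested
        strength = sum(nbrs.values())
        for j, w in nbrs.items():
            alpha[(i, j)] = (1.0 - w / strength) ** (k - 1)
    return alpha


def disparity_backbone(adj, level):
    """Keep a link if it is significant from at least one endpoint."""
    keep = set()
    for (i, j), value in disparity_alpha(adj).items():
        if value < level:
            keep.add(frozenset((i, j)))
    return keep


# Toy star: node a has one heavy link and two light ones.
star = {
    "a": {"b": 8, "c": 1, "d": 1},
    "b": {"a": 8},
    "c": {"a": 1},
    "d": {"a": 1},
}
backbone = disparity_backbone(star, level=0.05)
```

Only the heavy link between `a` and `b` passes the test at this level, because the ranking at each node is driven purely by the normalized weights.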

The LANS method, for each node $i$ and for any of its neighbors
$j$, considers the fraction of non-zero links whose weights are less
than or equal to $p_{ij}$,

$$F(p_{ij}) = \frac{1}{|N_i|} \sum_{m \in N_i} \mathrm{IND}\{p_{im} \le p_{ij}\},$$

where $\mathrm{IND}\{\cdot\}$ is the indicator function, $|N_i|$ is the number of neighbors of
node $i$, and $p_{ij}$ is the normalized weight of link $(i,j)$. If $1 - F(p_{ij})$ is
less than a predetermined significance level $\alpha$, the link $(i,j)$ is
locally significant and is included in the backbone network.
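The LANS criterion can be sketched in a few lines of Python using the empirical distribution of the normalized weights at each node. The names `lans_scores` and `lans_backbone` and the input format are assumptions for this example, as is the choice to keep a link when it is significant from either endpoint.

```python
def lans_scores(adj):
    """adj: {node: {neighbor: weight}}. Returns {(i, j): 1 - F(p_ij)}."""
    scores = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        strength = sum(nbrs.values())
        p = {j: w / strength for j, w in nbrs.items()}
        for j in nbrs:
            # Fraction of node i's links whose normalized weight is <= p_ij.
            frac = sum(1 for m in nbrs if p[m] <= p[j]) / k
            scores[(i, j)] = 1.0 - frac
    return scores


def lans_backbone(adj, level):
    """Assumption: a link is kept if it is significant from either endpoint."""
    keep = set()
    for (i, j), value in lans_scores(adj).items():
        if value < level:
            keep.add(frozenset((i, j)))
    return keep


star = {
    "a": {"b": 8, "c": 1, "d": 1},
    "b": {"a": 8},
    "c": {"a": 1},
    "d": {"a": 1},
}
scores = lans_scores(star)
```

Because the strongest link of any node always has $F = 1$, every node keeps at least one link under this rule, so in the toy star all three links survive through the pendant nodes.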




Figure 1. An undirected artificial network. The first number on the
line is the value of the link weight, and the second number is the value
of the link salience. Although the link $(i,p)$ gets the largest value 1 of
the link salience, it is only important for node $p$. The links $(i,k)$ and $(j,k)$
have the smallest value of the link salience, but they are in the core of
the network.

doi:10.1371/journal.pone.0100428.g001



Although neither of the local methods belittles links that
have small weights from a global view, because both consider the
importance of the links at each specific node, we argue that they
could ignore some links that have small weights but are important
with respect to the topological aspect. They assume that, for a certain node, its
neighboring links (the links that connect to the node) with larger
weights are more important. In many cases, however, local and
global topological structures of a link determine how important the
link is. For example, in Figure 2, although the weight of link $(i,j)$ is
greater than that of link $(i,k)$, link $(i,k)$ is more important than
link $(i,j)$ for node $i$ because $(i,k)$ is the path through which $i$ can
reach most of the other nodes. From the perspective of
information transfer, link $(i,k)$ can help node $i$ send or receive
information more effectively than link $(i,j)$ can, because deleting
link $(i,k)$ could cause more damage than deleting link $(i,j)$ for the
information transfer of the network.

Because the local and global methods have advantages and 
disadvantages, in this study we aim to design a backbone 
extraction method that accounts for both the global and local 
topological structure of the networks. The importance of links
is synthetically determined by the weights of the links, the degrees
of the nodes, and the topological structure. The results of
experiments on four real networks show that the proposed method
classifies links robustly and tends to place the topological core of
the network in the backbone.

Materials and Methods 

In this study, we are inspired by the ideas of link salience and
the disparity filter to propose a globally and locally adaptive
network backbone (GLANB) extraction method. First, for each
specific node, we compute the involvement of its neighboring links,
which measures the fraction of the shortest paths connecting the node
to the remainder of the network in which the links participate.
Second, we use a null hypothesis to determine whether each link is
statistically important based on its involvement.

2.1 Link Involvement 

We first consider a weighted, undirected and connected 
network. We define the length of the link $(i,j)$ as $d_{ij} = 1/w_{ij}$, with
$w_{ij}$ being the weight of link $(i,j)$, which is consistent with the definition
of the link length in the link salience method. In most networks the 
link weights denote the connection strength between nodes. For 
example, in social networks the link weights often denote the 
communication frequency between people. Thus, we assume that 
the links with high weights are important in our case, and we 
invert the weights to compute the link length that measures the 
distance between nodes. In practice, the formula of measuring link 




Figure 2. An undirected artificial network. The numbers on the
lines denote the weights of the links. Although the weight of link $(i,j)$ is
greater than that of link $(i,k)$, link $(i,k)$ is more important for node $i$
than link $(i,j)$ is, because link $(i,k)$ is the only path through which node
$i$ can reach the remainder of the network.
doi:10.1371/journal.pone.0100428.g002
length should depend on the meaning of the weights. The length of
a path that connects two terminal nodes $(n_1, n_T)$ and that consists
of $T-1$ links by a sequence of intermediate nodes $n_i$ and the link
weights $w_{n_i n_{i+1}} > 0$ is defined as $l = \sum_{i=1}^{T-1} d_{n_i n_{i+1}}$. The shortest
path minimizes the total distance $l$ and can be interpreted as the
most efficient route between its terminal nodes. The involvement
$I_{ij}$ of link $(i,j)$ is defined as

$$I_{ij} = \frac{1}{N-1} \sum_{s \neq i} \frac{g_{is}^{(ij)}}{g_{is}}, \qquad (1)$$

where $N$ is the number of nodes in the network; $g_{is}^{(ij)}$ is the number
of shortest paths between node $i$ and $s$ that pass through the link
$(i,j)$; and $g_{is}$ is the total number of shortest paths between node $i$
and $s$. The involvement $I_{ij}$ denotes how much the link $(i,j)$ is
involved in the most efficient connections between node $i$ and the
other nodes; thus, it can be a measure of the importance of link
$(i,j)$ for node $i$ in the view of information transfer between node $i$
and the remainder of the network. The larger the value of $I_{ij}$ is, the
more important link $(i,j)$ is for node $i$. We can see that
$\sum_{j \in N_i} I_{ij} = 1$, where $N_i$ is the set of neighbors of node $i$.
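To make the involvement concrete, the Python sketch below runs Dijkstra's algorithm from each node and counts, for every target, how many shortest paths start with each neighboring link, so that ties between shortest paths are shared. The function name `link_involvement`, the tolerance `EPS`, the adjacency-dictionary format and the toy networks are assumptions made for this example; the network is assumed to be connected and link lengths are taken as $1/w$.

```python
import heapq

EPS = 1e-12  # tolerance for treating two path lengths as equal


def link_involvement(adj):
    """adj: {node: {neighbor: weight}}, undirected and connected.

    Returns {(i, j): I_ij}, the involvement of link (i, j) for node i.
    """
    nodes = list(adj)
    n = len(nodes)
    inv = {}
    for i in nodes:
        dist = {i: 0.0}
        # first[s][j] = number of shortest i-s paths whose first link is (i, j)
        first = {i: {}}
        done = set()
        heap = [(0.0, i)]
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            for v, w in adj[u].items():
                if v in done:
                    continue
                nd = d + 1.0 / w
                contrib = {v: 1} if u == i else first[u]
                if v not in dist or nd < dist[v] - EPS:
                    dist[v] = nd
                    first[v] = dict(contrib)
                    heapq.heappush(heap, (nd, v))
                elif abs(nd - dist[v]) <= EPS:
                    # Another shortest path of equal length: add its counts.
                    for j, cnt in contrib.items():
                        first[v][j] = first[v].get(j, 0) + cnt
        for j in adj[i]:
            total = 0.0
            for s in nodes:
                if s != i:
                    total += first[s].get(j, 0) / sum(first[s].values())
            inv[(i, j)] = total / (n - 1)
    return inv


# Toy network in the spirit of Figure 2: the heavy link (i, j) leads to a
# pendant node, while the light link (i, k) leads to the rest of the network.
toy = {
    "i": {"j": 3, "k": 1},
    "j": {"i": 3},
    "k": {"i": 1, "a": 1, "b": 1},
    "a": {"k": 1},
    "b": {"k": 1},
}
involvement = link_involvement(toy)
```

In this toy network the light link `("i", "k")` carries three of the four shortest-path connections of node `i`, so its involvement exceeds that of the heavier link `("i", "j")`.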



The involvement is different from the betweenness centrality.
The betweenness of link $(i,j)$ depends on the shortest paths
between all pairs of nodes, but the involvement $I_{ij}$ only depends on
the shortest paths between the node $i$ and the rest of the nodes,
since the definition of involvement $I_{ij}$ is based on the idea of what
proportion of the rest of the nodes can connect to the node $i$ through
the link $(i,j)$. The involvement is also different from the salience
because the involvement considers the multiple shortest paths
between each pair of nodes, but the salience assumes that only one
shortest path exists between a pair of nodes. That is why the
GLANB can also be applied to unweighted networks, which often
have multiple shortest paths between each pair of nodes.

2.2 Statistical Importance (SI) of Links

We find that the involvement of links that are around a single
node is distributed heterogeneously (see Section 3.1). We are
interested in the links that have a significant involvement at each
given node. However, the local heterogeneity of involvement could
simply be produced by random fluctuations. Similar to the
disparity filter method, we adopt a null model to compute the
random expectation for the distribution of the involvements that is
associated with the links of a certain node. The null hypothesis is
that the involvement $I$ that corresponds to a connection of a
certain node of degree $k$ is produced by a random assignment
from a probability density function $f(x;k)$. Because the links
that are adjacent to a certain node with the degree of $k$ should
have the same chance under the random condition to connect the
node to the remainder of the network, the mean of the
involvement must satisfy the condition

$$E(x;k) = \frac{1}{k}, \quad k \ge 2 \ \text{and} \ x \in [0,1]. \qquad (2)$$

Many probability density functions satisfy this condition and
can be used to generate an involvement that is random and is
based on specific assumptions. For example, if we assume that for
each specific node that has the degree of $k$, its neighboring links
independently participate in the shortest paths between the node
and the remainder of the network with a probability of $1/k$, then
the involvement $I$ obeys approximately the normal distribution
that has a mean of $1/k$ and a variance of $\left(\frac{1}{k} - \frac{1}{k^2}\right)/(N-1)$.

Alternatively, we can assume that the involvement obeys the
power law distribution $f(x) = \mu x^{\gamma}$ because for most complex
networks, the degree and weight have been verified to follow
power law distributions [1,3]. It is easy to obtain the probability
density function $f(x;k) = \frac{1}{k-1} x^{-\frac{k-2}{k-1}}$, $k \ge 2$. Moreover, the
involvement can be assumed to follow a uniform distribution, which
is similar to what the disparity filter method does for the
normalized weights of the links [10], and has the probability
density function $f(x;k) = (k-1)(1-x)^{k-2}$.

The GLANB measures the statistical importance $SI_{ij}$ of link
$(i,j)$ by using a null model to calculate the probability that
its involvement is compatible with the null hypothesis.
The statistical importance $SI_{ij}$ of link $(i,j)$ is defined as

$$SI_{ij} = 1 - \int_0^{I_{ij}} f(x;k_i)\,dx, \quad k_i \ge 2, \qquad (3)$$

where $k_i$ is the degree of node $i$. In this study, the involvement is
assumed to follow a uniform distribution, i.e.,
$f(x;k_i) = (k_i-1)(1-x)^{k_i-2}$; thus, $SI_{ij} = (1-I_{ij})^{k_i-1}$, $k_i \ge 2$. To
control the impact of the degree on the statistical importance, we
add a parameter $c \ge 0$ to the formula, as follows:

$$SI_{ij} = (1-I_{ij})^{(k_i-1)^c}, \quad k_i \ge 2. \qquad (4)$$

If $c = 0$, then the statistical importance $SI_{ij}$ is determined only by
$I_{ij}$ and is not affected directly by the degree $k_i$ ($I_{ij}$ can be affected
indirectly by $k_i$ because the shortest paths to node $i$ are affected by
$k_i$). As $c$ increases, the impact of the degree becomes larger. The
experimental results show some interesting characteristics of the
GLANB method under different values of $c$ (see Section 3).

The smaller the value of $SI_{ij}$ is, the more significantly the link
$(i,j)$ is not compatible with a random distribution; furthermore,
the link $(i,j)$ can be considered more important due to the
network-organizing principles. The final statistical importance of
an undirected link $(i,j)$ is the minimum of $SI_{ij}$ and $SI_{ji}$. In the case
when a node $i$ of degree $k_i = 1$ is connected to a node $j$ of degree
$k_j > 1$, the statistical importance of link $(i,j)$ is $SI_{ji}$. The GLANB
can identify a backbone of a network by setting the significance
level $\alpha$ for the SI (a link is included in the backbone if its SI is less
than $\alpha$) based on the distribution of SI (see Section 3.3), or identify
the hierarchical backbones by setting different significance levels,
since the backbone under a high significance level will contain the
backbone under a low significance level. The backbone includes the
links that are statistically important according to the specified
significance level and their terminal nodes.
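A minimal Python sketch of the statistical importance under the uniform null model, $SI_{ij} = (1 - I_{ij})^{(k_i - 1)^c}$, followed by thresholding at a significance level. The function names `statistical_importance` and `backbone`, the dictionary formats and the hand-entered involvement values are assumptions for illustration; links are scored only from endpoints of degree at least 2, and the minimum over both endpoints is kept.

```python
def statistical_importance(inv, degree, c=1.0):
    """inv: {(i, j): I_ij}; degree: {node: k}.

    Returns {frozenset({i, j}): SI} for undirected links.
    """
    si = {}
    for (i, j), value in inv.items():
        k = degree[i]
        if k < 2:
            continue  # a degree-1 endpoint does not score its own link
        score = (1.0 - value) ** ((k - 1) ** c)
        link = frozenset((i, j))
        si[link] = min(score, si.get(link, 1.0))
    return si


def backbone(si, alpha):
    """Links whose statistical importance is below the significance level."""
    return {link for link, score in si.items() if score < alpha}


# Hand-entered involvements for a toy network: i-j, i-k, k-a, k-b,
# where j, a and b are pendant nodes.
inv = {
    ("i", "j"): 0.25, ("i", "k"): 0.75, ("j", "i"): 1.0,
    ("k", "i"): 0.5, ("k", "a"): 0.25, ("k", "b"): 0.25,
    ("a", "k"): 1.0, ("b", "k"): 1.0,
}
degree = {"i": 2, "j": 1, "k": 3, "a": 1, "b": 1}
si = statistical_importance(inv, degree, c=1.0)
core = backbone(si, alpha=0.3)
```

With these values only the link between `i` and `k` falls below the level 0.3. Raising the level enlarges the backbone monotonically, which is what makes the hierarchical backbones nested.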



2.3 Unweighted and Directed Networks

The GLANB method can be easily applied to unweighted
networks. In this case, the weights of all of the links are treated as
being equal; thus, the length of a path is the number of links that
lie in the path.

To be applied to directed networks, the GLANB must be
modified. The directed link $(i,j)$ from starting node $i$ to ending
node $j$ is an out-link for node $i$ and an in-link for node $j$.
Thus, we define the out (in) involvement $I_{ij}^{(out)}$ ($I_{ij}^{(in)}$) of the directed
link $(i,j)$ separately as
Figure 3. The local heterogeneity of the involvement and the normalized weight in four real networks. Each point in the figure denotes
a node in the network. The local heterogeneity of the involvement for node $i$ is defined as $\Upsilon_i(k_i) = k_i \sum_{j \in N_i} I_{ij}^2$, where $N_i$ is the set of neighbors of
node $i$, $k_i$ is the degree of node $i$, and $I_{ij}$ is the involvement of link $(i,j)$. The local heterogeneity of the normalized weight for node $i$ is defined as
$\Upsilon_i(k_i) = k_i \sum_{j \in N_i} (w_{ij}/s_i)^2$, where $s_i = \sum_{j \in N_i} w_{ij}$ is the strength of node $i$. We can find that for all of the networks, the involvement is locally more
heterogeneous than the normalized weight is.
doi:10.1371/journal.pone.0100428.g003



$$I_{ij}^{(out)} = \frac{1}{|N_i^{(out)}|} \sum_{s \in N_i^{(out)}} \frac{g_{is}^{(ij)}}{g_{is}} \quad \text{and} \quad I_{ij}^{(in)} = \frac{1}{|N_j^{(in)}|} \sum_{s \in N_j^{(in)}} \frac{g_{sj}^{(ij)}}{g_{sj}}, \qquad (5)$$

where $N_i^{(out)}$ is the set of nodes that can be reached from node $i$
through a directed path, and $N_j^{(in)}$ is the set of nodes that can reach
node $j$ through a directed path; $|N_i^{(out)}|$ denotes the size of $N_i^{(out)}$;
$g_{is}^{(ij)}$ is the number of shortest paths from node $i$ to $s$ that pass
through the link $(i,j)$; and $g_{is}$ is the total number of shortest paths
from node $i$ to $s$ ($g_{sj}^{(ij)}$ and $g_{sj}$ are the corresponding counts for
the shortest paths from node $s$ to node $j$). The involvement $I_{ij}^{(out)}$ measures how much the
link $(i,j)$ is involved in the shortest paths from node $i$ to the other
nodes, and $I_{ij}^{(in)}$ measures how much the link $(i,j)$ is involved in the
shortest paths from the other nodes to node $j$.

The statistical importance of link $(i,j)$ is composed of two parts,
the out-importance $SI_{ij}^{(out)}$ and the in-importance $SI_{ij}^{(in)}$, which are
defined from the viewpoint of the starting node $i$ and the ending
node $j$, respectively, as

$$SI_{ij}^{(out)} = \left(1 - I_{ij}^{(out)}\right)^{(k_i^{(out)} - 1)^c},\ k_i^{(out)} \ge 2 \quad \text{and} \quad SI_{ij}^{(in)} = \left(1 - I_{ij}^{(in)}\right)^{(k_j^{(in)} - 1)^c},\ k_j^{(in)} \ge 2, \qquad (6)$$

where $k_i^{(out)}$ is the out-degree of node $i$, $k_j^{(in)}$ is the in-degree of
node $j$, and $c$ is the control parameter. The final statistical
importance of the directed link $(i,j)$ is determined by the
minimum of $SI_{ij}^{(out)}$ and $SI_{ij}^{(in)}$. Similar to the case of weighted
and undirected networks, the GLANB can identify a backbone
from an unweighted or directed network by setting a significance level
for SI based on the distribution of SI, or a hierarchical backbone
by setting different significance levels for SI.
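The directed involvements can be sketched for the unweighted case with a breadth-first search, run once on the network (out-involvement) and once on the reversed network (in-involvement). The function names `first_hop_share` and `directed_involvement`, the successor-list input format and the toy network are assumptions for this example; weighted directed networks would need Dijkstra's algorithm instead of BFS.

```python
from collections import deque


def first_hop_share(succ, src):
    """BFS from src over succ; returns {first_hop: share}, where the share is
    the average, over all reachable nodes, of the fraction of shortest paths
    from src that start with the link (src, first_hop)."""
    dist = {src: 0}
    paths = {src: {}}  # paths[s][h] = number of shortest src-s paths via h
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, ()):
            contrib = {v: 1} if u == src else paths[u]
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = dict(contrib)
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                for h, cnt in contrib.items():
                    paths[v][h] = paths[v].get(h, 0) + cnt
    reached = [s for s in dist if s != src]
    share = {}
    for s in reached:
        total = sum(paths[s].values())
        for h, cnt in paths[s].items():
            share[h] = share.get(h, 0.0) + cnt / total / len(reached)
    return share


def directed_involvement(succ):
    """succ: {node: [successors]}. Returns (out_inv, in_inv), keyed by (i, j)."""
    pred = {}
    for u, vs in succ.items():
        pred.setdefault(u, [])
        for v in vs:
            pred.setdefault(v, []).append(u)
    out_inv, in_inv = {}, {}
    for node in pred:
        for j, value in first_hop_share(succ, node).items():
            out_inv[(node, j)] = value
        # On the reversed network, the first hop i corresponds to link (i, node).
        for i, value in first_hop_share(pred, node).items():
            in_inv[(i, node)] = value
    return out_inv, in_inv


# Toy directed diamond: a -> b -> d and a -> c -> d.
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
out_inv, in_inv = directed_involvement(diamond)
```

The out- and in-importance then follow by applying formula (6) to `out_inv` with out-degrees and to `in_inv` with in-degrees and taking the minimum of the two.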

Results 

To test the performance of the GLANB method, we apply it to
four real-world networks: a collaboration network (coauthor) [13],
an instant-message network (fetion), an email network (email) [14]
and an airport traffic network (airport). We compare the obtained
results with those obtained by the disparity filter method and
the link salience method. (1) The collaboration network is based on
co-authorship of academic papers in the high-energy physics
community from 1995 to 1999. Nodes represent individuals, and
links measure the number of papers that were co-authored. The
data are publicly available at http://www-personal.umich.edu/~mejn/netdata/.
(2) The instant-message network is based on an
instant-message tool, fetion, which is provided by Mobile
Corporate. The nodes represent fetion users, and the links
measure the number of messages sent between each pair of users.
(3) The email network is an undirected and unweighted network.
The nodes represent email users, and the links represent whether
any communication exists between each pair of users. The email
network data are available at http://deim.urv.cat/~aarenas/data/welcome.htm.
(4) The airport traffic network is a weighted
and directed network. It measures global air traffic based on
flight data provided by OAG Worldwide Ltd. (http://www.oag.com),
and it includes all of the scheduled commercial flights in
the world in 2011. The nodes represent airports worldwide. The
link weights measure the total number of passengers that travel
between a pair of airports by direct flights per year. This network
is well represented in the literature [15,16,17]. In the experiments,
only the largest connected subnetwork of each of the networks is
used. The backbone includes the links that are significantly
important according to the extraction methods and their terminal
nodes. Because the authors in [11] do not mention how the
salience method deals with directed or unweighted networks,
we do not apply the salience method to the email and the airport
networks.

Figure 4. Fraction of nodes maintained in the backbones. The fraction of nodes is a function of the fraction of links retained by the filters. The
dashed lines correspond to the fraction of the nodes whose degree is greater than 1 in the networks.
doi:10.1371/journal.pone.0100428.g004

3.1 Local Heterogeneity of the Link Involvement 

The condition under which the null model can perform well is
that for each node, its links' involvement shows heterogeneity. If
this condition is not satisfied, then it is difficult to identify the
important links through the GLANB method. To assess the effect
of heterogeneities in the links' involvements at the local level, for
each node $i$ of degree $k_i$, one can calculate the function [18,19]

$$\Upsilon_i(k_i) = k_i Y_i(k_i) = k_i \sum_{j \in N_i} I_{ij}^2, \qquad (7)$$

where $N_i$ is the set of neighbors of node $i$ and $I_{ij}$ is the involvement
of link $(i,j)$.

As a standard indicator of measuring the concentration of data,
the function $Y_i(k_i)$ has been extensively used in various domains,
including ecology, economics, physics, and complex networks
[10,19], where it is known as the disparity measure. Under perfect
homogeneity, when all of the links share the same amount of the
involvement of node $i$ (i.e., $I_{ij} = 1/k_i$), $\Upsilon_i(k_i)$ equals 1 independently
of $k_i$, while in the case of perfect heterogeneity, when only one of
the links carries the whole involvement of the node, the function is
$\Upsilon_i(k_i) = k_i$. In this way, this function can be used as a preliminary
indicator of the presence of local heterogeneity. When local
heterogeneity of involvements exists, the GLANB can be more
useful than in the case of homogeneity because the GLANB aims
to identify the links whose involvements are significantly higher
than those of other neighboring links. To compare the involvement with
the weights of the links, we also compute the heterogeneity of the
normalized weights [10] by

$$\Upsilon_i(k_i) = k_i \sum_{j \in N_i} \left(\frac{w_{ij}}{s_i}\right)^2,$$

where $s_i = \sum_{j \in N_i} w_{ij}$ is the strength of node $i$. Figure 3 shows the
local heterogeneity of the involvement and the normalized weight
in the coauthor, fetion, email, and airport networks. We can find
that for all of the networks, the involvement is locally more
heterogeneous than the normalized weight is (Figure 3). These
results indicate that applying the null model to the involvement
can identify the statistically important links well.
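The disparity measure is simple enough to check numerically. The Python sketch below evaluates $k \sum_j (x_j / \sum_m x_m)^2$ for the values attached to one node's links; the function name `local_heterogeneity` and the sample values are assumptions for illustration. It applies equally to involvements (which already sum to 1) and to raw link weights.

```python
def local_heterogeneity(values):
    """values: involvements or weights of the links of a single node.

    Returns k * sum of squared shares: 1 under perfect homogeneity,
    k when a single link carries everything.
    """
    k = len(values)
    total = sum(values)
    return k * sum((x / total) ** 2 for x in values)


homogeneous = local_heterogeneity([0.25, 0.25, 0.25, 0.25])
concentrated = local_heterogeneity([1.0, 0.0, 0.0, 0.0])
skewed_weights = local_heterogeneity([8, 1, 1])
```

The first two calls reproduce the limiting cases named in the text (1 and $k = 4$), and the third gives an intermediate value for a node whose weight is concentrated on one of three links.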

3.2 Size of the Backbones 

The main purpose of extracting backbones is to reduce the
number of links in networks while keeping as many nodes as possible. To
measure the effects of these filtering methods on the extracted 
backbones, we analyze the relative sizes of the backbones as a 
function of the preserved fractions of the links when the network is 
filtered by the disparity filter, by the link salience and by the 
GLANB (Figure 4). 

For the four real networks, the link salience method
preserves the largest fraction of nodes in the backbone, and the
disparity filter method preserves the smallest (except when the
fraction of links is less than 0.4 for the fetion network) when the
same fraction of links is maintained. The results of the GLANB
method fall in between those of the disparity filter and the salience methods.
We must note that for the salience method, all of the links that are
adjacent to the nodes with a degree of 1 have the largest salience of
1, and preserving each of these links retains at least one node. Thus,
the link salience method can preserve the largest fraction of the
nodes when filtering the networks.

We also find that in the backbones of the coauthor and fetion
networks identified by the GLANB method at the specified values
of the control parameter $c$, the fraction of nodes stays approximately
unchanged for an interval of the fraction of links when the fraction
of nodes reaches the threshold that is the fraction of nodes with a
degree greater than 1. For the email network, this phenomenon
also exists when $c = 0$ or $c = 1$. For the airport network, this
phenomenon exists when $c = 0$. The interval over which the fraction
stays unchanged is the longest for all of the networks when $c = 0$ (Figure 4).
The reason for the phenomenon is that for the nodes that have a
degree of 1, the value of SI of their neighboring links is very close
to 1; thus, these nodes are difficult to include in the backbone
when the fraction of links in the backbone is not sufficiently large.
Moreover, as the control parameter $c$ increases, the growth curves
of the fraction of nodes become relatively flat (Figure 4), because
a high value of $c$ favors the links that correspond to the nodes that
have a high degree, and these links have a low value of SI.
Preserving these links in the backbone cannot increase the fraction
of nodes proportionally because some other links that could have
been preserved in the backbone are more likely to share the same
terminal nodes with them. Thus, these results indicate that the
parameter $c$ can control the size of the extracted backbone by
adjusting the impact of the degrees of the nodes on the value of the
statistical importance.

3.3 Robust Classification of Links Based on the Statistical 
Importance 

Similar to the link salience measure [11], the surprising feature
of the statistical importance SI is that the distribution $p(SI)$
exhibits a characteristic bimodal shape on the unit interval
(Figure 5). The networks' links naturally accumulate at the
boundaries and have a small fraction at intermediate values. The
statistical importance thus successfully classifies network links into
two groups: important ($SI \approx 0$) or non-important ($SI \approx 1$). Because
a small fraction of links fall into the intermediate range, the
resulting classification is not significantly sensitive to an imposed
threshold. This circumstance is fundamentally different from some
link centrality measures, such as weight and betweenness, which
possess broad distributions and which require external and often
arbitrary threshold parameters to perform meaningful classifications.
The distribution of links' statistical relevance when measured
by the disparity filter method shows a unimodal shape in the
coauthor network or a flat distribution in the fetion network
(Figure 5), which makes choosing the appropriate
significance level $\alpha$ to filter links difficult. For the GLANB
method, as the control parameter $c$ increases, the number of links
with high importance increases (Figure 5). The reason is that the
GLANB method with a high value of $c$ favors the links that
correspond to the nodes that have the degree $k > 1$, and these links
occupy a large proportion of total links.

3.4 K-shell distribution of links 

To explore the hierarchy of links in the backbones that
are extracted by the GLANB, disparity filter and salience methods,
we use the k-shell decomposition method to compare the
topological distribution of extracted links. The k-shell decomposition
method is often used to identify the core and the periphery
of the networks [20,21]. Although the k-shell method only takes
into account the nodes' degrees and not the link weights, it provides a
way to compare the backbone extraction methods from the view of
topological structure. The process of the k-shell decomposition
starts by recursively removing all of the nodes that have only one
link (degree 1) until no more such nodes remain; it assigns them to the
1-shell. In the same manner, it recursively removes all of the nodes
that have a degree of 2 (or less), creating the 2-shell. This process
continues, increasing $k$ until all of the nodes in the network have
been assigned to one of the shells. The shells that have high indices
lie in the core of the network. To assign all of the links to the shells,
we define the shell index of a link as the minimum of its two
terminal nodes' shell indices.
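The peeling procedure and the link-shell rule can be sketched as follows in Python. The function names `k_shell` and `link_shells`, the neighbor-list input format and the toy network are assumptions for this example; the sketch follows the standard k-shell peeling on an unweighted, undirected network.

```python
def k_shell(adj):
    """adj: {node: iterable of neighbors}, undirected. Returns {node: shell}."""
    deg = {u: len(vs) for u, vs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 1
    while remaining:
        peel = [u for u in remaining if deg[u] <= k]
        if not peel:
            k += 1  # nothing left to remove at this level
            continue
        while peel:
            u = peel.pop()
            if u not in remaining:
                continue
            remaining.discard(u)
            shell[u] = k
            # Removing u lowers the degree of its remaining neighbors,
            # which may push them into the current shell as well.
            for v in adj[u]:
                if v in remaining:
                    deg[v] -= 1
                    if deg[v] <= k:
                        peel.append(v)
    return shell


def link_shells(adj):
    """Shell index of a link = minimum shell index of its two end nodes."""
    shell = k_shell(adj)
    return {
        frozenset((u, v)): min(shell[u], shell[v])
        for u in adj
        for v in adj[u]
    }


# Toy network: a triangle a-b-c with a pendant node d attached to a.
toy = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
shells = link_shells(toy)
```

In the toy network the pendant link falls in the 1-shell while the triangle links fall in the 2-shell, which is the kind of core versus periphery split used to compare the extraction methods.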

For the coauthor, fetion and email networks, we extract the top
10% important links based on the $SI_{ij}$ of GLANB (from low to
high), the $\alpha_{ij}$ of the disparity filter (from low to high) and the $s_{ij}$ of
the salience method (from high to low) separately to analyze their
distributions in terms of link-shells. Because the salience method
ranks the links for which one terminal node has the degree of 1 as
most important, and because both the disparity filter and the
GLANB methods rank them as least important, we also exclude
these links to extract the remaining top 10% important links based
on the salience method (salience-E) to analyze the distribution.
The distributions of the links in the range of the shell index are
shown in Figure 6. We can see that compared with the disparity
and salience methods, the GLANB ($c = 2$) extracts more links that
lie in the higher shells, i.e., the topological core of the networks.
Especially for the salience method, most of the extracted links lie in
the lower shells. This circumstance occurs because the links whose
terminal nodes have a low degree tend to have a high salience. For
example, the links that are adjacent to the nodes that have the
degree of 1 have the highest salience of 1, which means that all of
the links in the 1-shell are certain to be in the backbone that is
extracted by the link salience method. For the salience-E method,
most of the links still fall in the low shells, and the distribution
almost coincides with that of the GLANB ($c = 0$), which ignores the
degree of the corresponding nodes in a similar way as the salience
method. As the control parameter $c$ increases, more links fall into
the higher link-shells.

There are two reasons why the GLANB ($c > 0$) is
more likely to extract links from the topological core of the
networks than the other methods. One reason is that the backbone
which is extracted by the GLANB ($c > 0$) method does not include
the links that are adjacent to the nodes that have a degree of 1.
The second reason is that the null model depends on the degrees
of the nodes. When $I_{ij}$ stays unchanged, increasing the value of
degree $k_i$ can decrease the value of $SI_{ij}$ in a power-law way (see
formula 4). The larger the value of $c$ is, the more greatly $k_i$ affects
$SI_{ij}$. Thus, some links that have a higher shell index would be in
the backbone even though their involvement values are not very
high. Furthermore, from Figure 3, we can see that the distribution
of link involvements for the nodes that have a higher degree shows
heterogeneity, which means that some links have both a high-degree
terminal and a high involvement.

Figure 5. The distributions of the link salience, the link statistical importance and the disparity filtering importance. Link 
measurement refers to the values of the link salience, link statistical importance, and disparity filtering importance that are given by the salience, 
GLANB and disparity methods, respectively. For the GLANB and disparity methods, smaller values mean higher importance. For the salience 
method, larger values mean higher importance. 
doi:10.1371/journal.pone.0100428.g005 

Discussion 

The GLANB method accounts for both the global and the local 
topological structure of the network when extracting the backbones. 
On the one hand, the involvement of each link is both a global 
measure (because it depends on the shortest paths that are 
determined by the global network structure and the link weights) 
and a local measure (because the involvements of the 
links that are adjacent to any given node sum to 1). On 
the other hand, the null model that is adopted in GLANB is based 
on a local view because the probability density function depends 
on the degree of each node. Thus, the GLANB determines 
the importance of the links by synthetically considering the 
topological structure, the weights of the links and the degrees of 
the nodes. In this method, the links that have a small weight but 
are important from the view of structure are not belittled. 
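
The dual global and local character of the involvement can be made concrete with a short sketch. It is an illustration under stated assumptions, not the authors' implementation: link weights are treated as positive lengths, the involvement of a link (i, j) is taken as the average, over all nodes reachable from i, of the fraction of shortest paths from i that start with that link, and the function name `involvement_from` is hypothetical. Integer weights are used so that distance comparisons are exact.

```python
import heapq
from collections import defaultdict


def involvement_from(adj, source):
    """Involvement of each link adjacent to `source` (assumed definition, see text).

    adj maps node -> {neighbor: positive length}; the graph is undirected.
    """
    # Global step: Dijkstra distances from the source over the whole network.
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))

    # paths[t][j] = number of shortest paths source -> t whose first hop is j.
    paths = {}
    for t in sorted(dist, key=dist.get):
        if t == source:
            continue
        counts = defaultdict(int)
        for u, w in adj[t].items():
            if u in dist and dist[u] + w == dist[t]:
                if u == source:
                    counts[t] += 1
                else:
                    for j, n in paths[u].items():
                        counts[j] += n
        paths[t] = counts

    # Local step: normalize so the values over the adjacent links sum to 1.
    involvement = {j: 0.0 for j in adj[source]}
    for counts in paths.values():
        total = sum(counts.values())
        for j, n in counts.items():
            involvement[j] += n / total / len(paths)
    return involvement


# Toy triangle: the direct link a-c is long, so every shortest path from a starts with a-b.
triangle = {
    "a": {"b": 1, "c": 5},
    "b": {"a": 1, "c": 1},
    "c": {"a": 5, "b": 1},
}
print(involvement_from(triangle, "a"))
```

The values depend on distances computed over the whole network, yet for each connected node they sum to 1 over its own adjacent links, which is the local normalization described above.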




Figure 6. The distribution of links in link-shells. For the coauthor, fetion and email networks, we extract the top 10% important links, based on 
the GLANB, disparity filter and salience methods separately, to analyze their distributions in terms of link-shells. In addition, we also exclude the links 
that are adjacent to nodes with a degree of 1 to extract the remaining top 10% important links based on the salience method (salience-E) to analyze the distribution. 
doi:10.1371/journal.pone.0100428.g006 

Furthermore, introducing the control parameter c into GLANB 
provides a way to adjust the impact of the node degrees on the 
extracted backbones, which makes the backbone adaptive to the 
global structure and the local structure by changing the value of c. 
When c → 0, the backbone mainly concentrates on the global 
structure. When the value of c becomes larger, the backbone is 
affected more strongly by the local structure. Another advantage is 
that the GLANB method can be applied to all types of networks 
regardless of whether they are weighted or unweighted and 
regardless of whether they are directed or undirected. 

The computational complexity of the GLANB method is 
determined by the computation of the involvement and the 
statistical importance of the links. To compute the involvement, 
we must find all of the shortest paths between each pair of nodes, 
which has a computational complexity of 
$O(NL + N^2 \ln N)$ [22], where N is the number of nodes and 
L is the number of links in the network. The computation of the 
statistical importance must scan all of the links to compute the 
degrees of the nodes and the $SI_{ij}$ of the links; thus, its 
computational complexity is $O(L)$. Because $L < N^2$, the overall 
computational complexity of GLANB is $O(NL + N^2 \ln N)$. When the 
network is very large, GLANB is impractical to 
run on a single computer. However, the computational 
environment has recently been changing dramatically. Parallel 
computation platforms are being used pervasively because of their 
low implementation costs and high performance. Because the 
GLANB method measures the involvement and statistical importance 
of the links adjacent to each node independently, node by node, it 
is easy to implement GLANB on a parallel platform. 
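
The node-by-node independence described above can be sketched as a map over source nodes. This is a minimal illustration under assumptions, not the authors' parallel implementation: the network is unweighted, the per-node involvement uses the same assumed definition (average fraction of shortest paths that start with each adjacent link), and the names `involvement_from` and `all_involvements` are hypothetical. A thread pool is used only to show the structure; in CPython, CPU-bound work would need a process pool or a cluster framework to gain real speed.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def involvement_from(adj, source):
    """Per-node task: involvement of each link adjacent to `source` (unweighted BFS)."""
    dist = {source: 0}
    counts = {source: {}}  # counts[t][j]: shortest paths to t whose first hop is j
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                counts[v] = {}
                queue.append(v)
            if dist[v] == dist[u] + 1:
                first_hops = {v: 1} if u == source else counts[u]
                for j, n in first_hops.items():
                    counts[v][j] = counts[v].get(j, 0) + n
    targets = [t for t in dist if t != source]
    involvement = {j: 0.0 for j in adj[source]}
    for t in targets:
        total = sum(counts[t].values())
        for j, n in counts[t].items():
            involvement[j] += n / (total * len(targets))
    return involvement


def all_involvements(adj, executor=None):
    """Run the per-node task for every node, serially or through an executor."""
    mapper = executor.map if executor is not None else map
    nodes = list(adj)
    return dict(zip(nodes, mapper(partial(involvement_from, adj), nodes)))


# Toy network: a 4-cycle a-b-c-d-a.
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
serial = all_involvements(square)
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = all_involvements(square, executor=pool)
print(serial["a"], serial == threaded)
```

Because each task reads the shared adjacency structure and writes only its own result, no coordination between tasks is needed beyond collecting the outputs.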

The experiments on the real-world networks show some 
interesting results. First, the link involvements show local 
heterogeneity that arises from the topological structure of the 
networks and from the heterogeneous weight distributions because 
the shortest paths are determined by those two aspects. Moreover, 
the involvement is more heterogeneous in the weighted network 
than in the unweighted network (Figure 3). Second, the link 
importance distribution, which shows a bimodal shape, gives a 